ume: Add extensive documentation for user mode emulation #950

Merged
Jozott00 merged 6 commits into master from wip/ume-documentation
Jun 3, 2026
Conversation

@arcane-quill

Contributor

This PR covers issue #764 and adds extensive documentation for user mode emulation (UME), including examples and descriptions for various specification cases.

@github-actions github-actions Bot added the docs Improvements or additions to documentation label May 11, 2026
@AndreasKrall

Contributor

@arcane-quill The language tutorial is a tutorial for the user of VADL. It has to explain how to use UME and has to include some examples. Therefore, the tutorial should not contain any text about the implementation. So delete everything starting with "In order for this to work, custom VADL annotations" and all the following lines.

Comment thread docs/refman/tutorial.md Outdated

@AndreasKrall AndreasKrall left a comment

Contributor

Fixed typo, otherwise it is fine

@AndreasKrall AndreasKrall force-pushed the wip/ume-documentation branch from 51fceb6 to 479d259 Compare May 18, 2026 11:54
@AndreasKrall AndreasKrall marked this pull request as ready for review May 18, 2026 11:54
@AndreasKrall AndreasKrall enabled auto-merge (squash) May 18, 2026 11:55

@Jozott00 Jozott00 left a comment

Contributor

I think some parts are missing, e.g. syscall number matching and other kinds of register definitions, such as using non-ABI registers (the actual source registers from the ISA, like X(1)). Or should it only be possible to use alias registers defined in the ABI?

Comment thread docs/refman/tutorial.md Outdated
@Jozott00 Jozott00 disabled auto-merge May 18, 2026 11:57
@Jozott00

Jozott00 commented May 18, 2026

Contributor

Another thing I noticed: typically, syscalls are triggered by exceptions (which are in turn triggered by instructions). In QEMU, the execution loop typically checks for a specific exception and state condition to determine whether a syscall was triggered.

Therefore, I think the syscall instruction is not really necessary or meaningful. I am not sure whether any architecture has a non-exception syscall trigger mechanism, but for now I would concentrate on exceptions.

Speaking of exceptions, the exception definition alone won't be sufficient to recognize a syscall. For example, sys/risc-v/rvcsr.vadl, which defines RV32Zicsr, uses only a single exception definition to handle all kinds of exceptions. The actual exception code is passed as an argument to the exception:

enumeration ExcCode: ExcCodeSize =
  { ILLEGAL_INSTR = 0x02
  , BREAKPOINT    = 0x03
  , M_ECALL       = 0x0B
}

exception Exc(cause: ExcCodeSize) = {
// ...
}

This means the syscall trap mechanism must be a combination of an initial exception and a condition check on some values. In the case of RV32Zicsr, it would be sufficient to have access to the cause argument, e.g. like this:

[ condition : cause = ExcCode::M_ECALL ]
syscall exception = Exc

However, it might also be necessary to have access to the CPU state to evaluate the condition. Unfortunately, I don't have enough knowledge about the different syscall mechanisms used in practice.
Also, whether this should be an annotation as in the example above or a more sophisticated definition depends on the potential complexity of such trigger mechanisms.
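
As a rough illustration of the "exception plus condition" idea above, here is a minimal Python sketch (not VADL and not generator code). `SyscallTrigger` and its fields are hypothetical names; only the `ExcCode` values and the exception name `Exc` come from the snippet quoted in this comment.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class ExcCode(IntEnum):
    # Values copied from the ExcCode enumeration quoted above.
    ILLEGAL_INSTR = 0x02
    BREAKPOINT = 0x03
    M_ECALL = 0x0B


@dataclass(frozen=True)
class SyscallTrigger:
    """Hypothetical model: an exception name plus a condition on its argument."""

    exception: str
    condition: Callable[[int], bool]

    def matches(self, raised: str, cause: int) -> bool:
        # A syscall is recognized only if both the exception and the condition match.
        return raised == self.exception and self.condition(cause)


# Mirrors `[ condition : cause = ExcCode::M_ECALL ] syscall exception = Exc`.
trigger = SyscallTrigger("Exc", lambda cause: cause == ExcCode.M_ECALL)

print(trigger.matches("Exc", ExcCode.M_ECALL))
print(trigger.matches("Exc", ExcCode.BREAKPOINT))
```

The sketch only inspects the exception argument; a condition that also needs CPU state would have to take that state as a further parameter.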

@AndreasKrall

Contributor

Another thing I noticed: typically, syscalls are triggered by exceptions (which are in turn triggered by instructions). In QEMU, the execution loop typically checks for a specific exception and state condition to determine whether a syscall was triggered.

It is the other way around. A syscall triggers a privileged exception changing the privilege level.

Therefore, I think the syscall instruction is not really necessary or meaningful. I am not sure whether any architecture has a non-exception syscall trigger mechanism, but for now I would concentrate on exceptions.

As it is the other way around, the syscall instruction does make sense, and syscalls can be implemented without exceptions. Of course, syscalls and exceptions are caught in privileged mode in a similar way. But in UME there is no privileged mode, because the privileged mode is emulated. It is possible to implement UME without exceptions, but because the privileged exception handling is emulated and shared, they should be implemented together.

For exceptions you also have to distinguish between user mode exceptions and privileged exceptions. User mode exceptions can be caught by a user mode exception handler; privileged exceptions change to privileged mode. User mode exceptions are caught by an exception handler written in assembly language, which is invoked from a certain address. I believe that only a few of our VADL specifications have user mode exception handling implemented. It should be defined in the processor section.

I have not looked into the privileged instruction set of RISC-V closely enough to know the details. But exceptions are only implemented in a very rudimentary way in our VADL specifications.
Privileged mode is not defined at all. In the AArch64 spec there is some privilege level checking for access to system registers.

Speaking of exceptions, the exception definition alone won't be sufficient to recognize a syscall. For example, sys/risc-v/rvcsr.vadl, which defines RV32Zicsr, uses only a single exception definition to handle all kinds of exceptions. The actual exception code is passed as an argument to the exception:

enumeration ExcCode: ExcCodeSize =
  { ILLEGAL_INSTR = 0x02
  , BREAKPOINT    = 0x03
  , M_ECALL       = 0x0B
}

exception Exc(cause: ExcCodeSize) = {
// ...
}

This means the syscall trap mechanism must be a combination of an initial exception and a condition check on some values. In the case of RV32Zicsr, it would be sufficient to have access to the cause argument, e.g. like this:

[ condition : cause = ExcCode::M_ECALL ]
syscall exception = Exc

I do not think that specifying it this way is useful. I have to think about it more deeply and read the processor documentation.

However, it might also be necessary to have access to the CPU state to evaluate the condition. Unfortunately, I don't have enough knowledge about the different syscall mechanisms used in practice. Also, whether this should be an annotation as in the example above or a more sophisticated definition depends on the potential complexity of such trigger mechanisms.

@Jozott00

Contributor

It is the other way around. A syscall triggers a privileged exception changing the privilege level.

The trigger mechanism I was talking about is on the QEMU implementation side. In QEMU, the UME is an execution loop (similar to full system emulation), where each iteration checks for an interrupt or exception.
So in this loop body there is essentially a switch over all exceptions that have to be handled by the UME. One of them is the syscall exception (ECALL in RISC-V), which triggers the underlying syscall handling.

This means that, from the generator's perspective, it doesn't matter which instruction causes the exception; it only matters which exception (under which condition) should be recognized as a syscall. Therefore, the syscall instruction per se has no relevance to the generated execution loop, which looks for a user-defined exception for syscall handling.

As we control the whole generation, we could of course also handle syscalls by defining the syscall instruction and automatically throwing a synthetic exception in the QEMU implementation. However, this is way more implicit and does not allow as much control by the user (e.g. if the syscall is only triggered on some specific condition in the syscall instruction). And because (I think) most architectures are using exceptions for syscalls, I would concentrate on them and later add alternative mechanisms when we need them.
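
The loop structure described above can be pictured with a small Python sketch. It is only a model of the control flow, not QEMU code; `Exc`, `cpu_loop`, and `handle_syscall` are hypothetical names, and raising `RuntimeError` for unhandled exceptions is a placeholder choice.

```python
from enum import Enum, auto


class Exc(Enum):
    NONE = auto()
    SYSCALL = auto()  # hypothetical: the exception recognized as a syscall
    ILLEGAL = auto()


def cpu_loop(steps, handle_syscall):
    """Run each step (a callable returning an Exc) and dispatch on the result.

    Returns the list of syscall results collected along the way.
    """
    results = []
    for step in steps:
        exc = step()  # execute one chunk of guest code
        if exc is Exc.NONE:
            continue
        if exc is Exc.SYSCALL:
            # Which instruction raised the exception is irrelevant here.
            results.append(handle_syscall())
            continue
        raise RuntimeError(f"unhandled exception: {exc.name}")
    return results


trace = cpu_loop([lambda: Exc.NONE, lambda: Exc.SYSCALL], lambda: 0)
print(trace)
```

The dispatch depends only on the exception value, which is the point made above: the loop never needs to know which instruction produced it.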

@Jozott00

Contributor

@arcane-quill I had a meeting with @AndreasKrall to discuss some aspects of the UME VADL definition and QEMU generation.

We have agreed on the following:

  • We use a syscall instruction instead of a syscall exception as proposed above. This means that, under the hood, the UME generator will replace the instruction's behavior with a raise of a synthetic SYSCALL exception, which is used in the CPU loop to recognize the syscall.
  • For now, we skip the exception behavior definition, as it is not yet clear how to define exception behavior in VADL. The exception section in the docs can therefore be marked as "work in progress".
  • The ABI definition is mandatory for the UME definition, as it contains the stack pointer. However, registers used by the UME definition, such as syscall arguments, can be either an alias register from the ABI or an actual register from the ISA definition.
  • For now, all exceptions other than the synthetic syscall exception lead to a UME abort.
  • We will need some kind of table in VADL to define the Linux syscall numbers, as they may differ between architectures. I’ll take a closer look at the syscall.tbl files in the upstream QEMU code so that we can develop a suitable design for it.
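
A minimal Python sketch of the agreed behavior, for orientation only. All names (`SyntheticSyscall`, `UmeAbort`, `ABI_ALIASES`, `resolve_register`, `step`) are hypothetical, and the alias table assumes the standard RISC-V ABI register numbering purely as an example.

```python
class SyntheticSyscall(Exception):
    """Stands in for the synthetic SYSCALL exception raised by the syscall instruction."""


class UmeAbort(Exception):
    """Stands in for aborting the emulation on any other exception."""


# Assumed example aliases (RISC-V ABI names mapped to X register indices).
ABI_ALIASES = {"sp": 2, "a0": 10, "a7": 17}


def resolve_register(reg):
    """Accept either an ABI alias such as "a7" or an ISA register index such as 17."""
    if isinstance(reg, int):
        return reg
    return ABI_ALIASES[reg]


def syscall_instruction():
    # The instruction's original behavior is replaced by raising the synthetic exception.
    raise SyntheticSyscall()


def step(behavior):
    """Execute one instruction behavior and classify the outcome."""
    try:
        behavior()
    except SyntheticSyscall:
        return "syscall"
    except Exception as exc:
        # Every exception other than the synthetic one aborts the emulation.
        raise UmeAbort(f"unhandled exception: {exc!r}") from exc
    return "ok"


print(step(syscall_instruction), resolve_register("a7"), resolve_register(17))
```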

@arcane-quill If anything is unclear, please let me know.

@arcane-quill

Contributor Author

I was thinking about how to handle the syscalls. Maybe we can use the specific VADL files to define the non-generic syscalls and then overwrite/add them to the existing table? Maybe we can put those existing syscalls in a separate VADL file to make it easier to generate everything correctly via the interpolation? @Jozott00

@Jozott00

Contributor

I was thinking about how to handle the syscalls. Maybe we can use the specific VADL files to define the non-generic syscalls and then overwrite/add them to the existing table?

You mean the syscall numbers, right?
Yeah, I would define some kind of extendable syscall table, so that one table can overwrite syscalls of its super table. VADL would provide a built-in generic table that already defines all generic syscalls.
Something like:

syscall table riscv64_syscalls extending VADL::GenericSysCallTable {
  // define syscall overwrites and add specific ones ...
} 

syscall table some_other_table extending riscv64_syscalls {
  // inherit riscv64 ones
}

However, I think we should not interpolate the existing table, but render the table from scratch based on the VADL definition.
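
One way to picture the extension semantics and the "render from scratch" approach described above, sketched in Python rather than VADL. The function names `extend` and `render`, the syscall names, and all numbers are made up for illustration and are not taken from any real syscall.tbl.

```python
# Hypothetical generic table; names and numbers are placeholders.
GENERIC = {"generic_a": 1, "generic_b": 2, "generic_c": 3}


def extend(parent, overrides):
    """Return a new table that inherits from `parent` and applies `overrides`."""
    table = dict(parent)  # copy, so the parent table stays untouched
    table.update(overrides)  # overwrite inherited entries and add new ones
    return table


def render(table):
    """Render the full table from scratch, one "<number>\t<name>" line per entry."""
    rows = sorted(table.items(), key=lambda item: item[1])
    return "\n".join(f"{number}\t{name}" for name, number in rows)


# A target table overwrites one generic entry and adds a target-specific one.
target = extend(GENERIC, {"generic_b": 20, "target_only": 30})
# A further table inherits everything from the target table unchanged.
other = extend(target, {})

print(render(target))
```

Because each derived table is fully materialized, the output can be generated directly from the definition without patching an existing file.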

Maybe we can put those existing syscalls in a separate VADL file to make it easier to generate everything correctly via the interpolation? @Jozott00

I’m not entirely sure what you mean by "to make it easier to generate everything correctly via the interpolation", but generally, placing the syscall table in a separate VADL file and then importing it is probably the best approach. However, it’s not mandatory (it’s the user’s choice).

@Jozott00

Contributor

I just saw your suggested approach in the documentation (with the enumeration). While an enumeration would also work, it has a drawback: it lacks a declaration side type, which prevents the LSP from recognizing it as a syscall table. Consequently, the LSP cannot provide autocompletion or on-the-fly diagnostics (at least in most cases).

I personally don't care that much about those features, but that is just a consideration. Of course, the benefit of such enumeration is that it’s already implemented in the frontend, so there is no need to add the syscall table definition to the language.

Also you have to check if there are syscalls that need more than just the number assignment. If that is the case, the enumeration is not a sufficient type.

@AndreasKrall if you are fine with enumerations for syscall tables, I am too.

@AndreasKrall

Contributor

I just saw your suggested approach in the documentation (with the enumeration). While an enumeration would also work, it has a drawback: it lacks a declaration side type, which prevents the LSP from recognizing it as a syscall table. Consequently, the LSP cannot provide autocompletion or on-the-fly diagnostics (at least in most cases).

I personally don't care that much about those features, but that is just a consideration. Of course, the benefit of such enumeration is that it’s already implemented in the frontend, so there is no need to add the syscall table definition to the language.

Also you have to check if there are syscalls that need more than just the number assignment. If that is the case, the enumeration is not a sufficient type.

@AndreasKrall if you are fine with enumerations for syscall tables, I am too.

I suggest starting with the enumeration: it reuses existing constructs, and we can expand it in the future when we see that we have further requirements.

@Jozott00 Jozott00 left a comment

Contributor

LGTM

@Jozott00 Jozott00 force-pushed the wip/ume-documentation branch from 9902715 to 25e47ec Compare June 3, 2026 12:13
@Jozott00 Jozott00 enabled auto-merge (squash) June 3, 2026 12:13
@Jozott00 Jozott00 disabled auto-merge June 3, 2026 12:14
Comment thread docs/refman/tutorial.md
@Jozott00 Jozott00 merged commit 8ab6801 into master Jun 3, 2026
6 checks passed
@Jozott00 Jozott00 deleted the wip/ume-documentation branch June 3, 2026 13:10
Labels

docs Improvements or additions to documentation

3 participants